The article introduces Pico-Banana-400K, a large-scale dataset of approximately 400,000 text-image-edit triplets for text-guided image editing. It addresses limitations of existing datasets by providing high-quality, diverse edit pairs generated from real photographs, with the aim of supporting research in multimodal image editing.
The edits cover a variety of operations across multiple semantic categories, and their quality is evaluated using AI models. In addition to single-step editing, the dataset includes specialized subsets for multi-turn editing, preference research, and instruction summarization.